Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 25751 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.1 MiB |
| Average record size in memory | 125.0 B |
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 2 |
Alerts
| Alert | Type |
|---|---|
| alert_key is highly correlated with date | High correlation |
| date is highly correlated with alert_key | High correlation |
| tx_time is highly correlated with amt | High correlation |
| amt is highly correlated with tx_time | High correlation |
| amt is highly skewed (γ1 = 82.57466783) | Skewed |
| alert_key has unique values | Unique |
| total_asset has 3120 (12.1%) zeros | Zeros |
| tx_time has 8142 (31.6%) zeros | Zeros |
| amt has 6043 (23.5%) zeros | Zeros |
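The zero percentages quoted in the alerts can be re-derived from the counts and the number of observations. The following is a minimal sketch of that arithmetic; the helper name `share` is hypothetical and not part of pandas-profiling.

```python
def share(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


# Number of observations, copied from the dataset statistics table.
N_OBSERVATIONS = 25751

# Zero counts copied from the alerts.
zeros = {"total_asset": 3120, "tx_time": 8142, "amt": 6043}

zero_shares = {name: share(count, N_OBSERVATIONS) for name, count in zeros.items()}
print(zero_shares)
```

The same helper can be used to cross-check any Count and Frequency (%) pair in the tables of this report.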
Reproduction
| Analysis started | 2022-12-17 15:11:37.165880 |
|---|---|
| Analysis finished | 2022-12-17 15:12:01.738154 |
| Duration | 24.57 seconds |
| Software version | pandas-profiling v3.4.0 |
| Download configuration | config.json |
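The reported duration follows directly from the two timestamps. As a small sanity check, assuming both timestamps are in the same time zone, the difference can be computed with the standard library:

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-12-17 15:11:37.165880", FMT)
finished = datetime.strptime("2022-12-17 15:12:01.738154", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```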
alert_key
Real number (ℝ≥0)
| Distinct | 25751 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 265685.6269 |
| Minimum | 171142 |
|---|---|
| Maximum | 365073 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 402.4 KiB |
Quantile statistics
| Minimum | 171142 |
|---|---|
| 5-th percentile | 176967.5 |
| Q1 | 212536 |
| median | 266346 |
| Q3 | 316658.5 |
| 95-th percentile | 356314.5 |
| Maximum | 365073 |
| Range | 193931 |
| Interquartile range (IQR) | 104122.5 |
Descriptive statistics
| Standard deviation | 58623.84087 |
|---|---|
| Coefficient of variation (CV) | 0.2206511566 |
| Kurtosis | -1.273665901 |
| Mean | 265685.6269 |
| Median Absolute Deviation (MAD) | 51941 |
| Skewness | -0.008439744393 |
| Sum | 6841670579 |
| Variance | 3436754718 |
| Monotonicity | Not monotonic |
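Several of the statistics above are derived from others: the range from the extremes, the IQR from the quartiles, the coefficient of variation from the standard deviation and mean, and the variance from the standard deviation. A minimal sketch that recomputes them from the reported values (the variable names are illustrative only):

```python
import math

# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = 171142, 365073
q1, q3 = 212536, 316658.5
mean, std = 265685.6269, 58623.84087

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 6), round(variance))

# The recomputed variance agrees with the reported one up to rounding.
assert math.isclose(variance, 3436754718, rel_tol=1e-6)
```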
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 352249 | 1 | < 0.1% |
| 288385 | 1 | < 0.1% |
| 288417 | 1 | < 0.1% |
| 288413 | 1 | < 0.1% |
| 288404 | 1 | < 0.1% |
| 288403 | 1 | < 0.1% |
| 288401 | 1 | < 0.1% |
| 288399 | 1 | < 0.1% |
| 288397 | 1 | < 0.1% |
| 288395 | 1 | < 0.1% |
| Other values (25741) | 25741 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 171142 | 1 | < 0.1% |
| 171152 | 1 | < 0.1% |
| 171177 | 1 | < 0.1% |
| 171178 | 1 | < 0.1% |
| 171180 | 1 | < 0.1% |
| 171181 | 1 | < 0.1% |
| 171189 | 1 | < 0.1% |
| 171192 | 1 | < 0.1% |
| 171197 | 1 | < 0.1% |
| 171200 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 365073 | 1 | < 0.1% |
| 365009 | 1 | < 0.1% |
| 365008 | 1 | < 0.1% |
| 365004 | 1 | < 0.1% |
| 365001 | 1 | < 0.1% |
| 364999 | 1 | < 0.1% |
| 364996 | 1 | < 0.1% |
| 364995 | 1 | < 0.1% |
| 364994 | 1 | < 0.1% |
| 364993 | 1 | < 0.1% |
cust_id
Real number (ℝ≥0)
| Distinct | 7708 |
|---|---|
| Distinct (%) | 29.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3694.211137 |
| Minimum | 0 |
|---|---|
| Maximum | 7707 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 301.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 268.5 |
| Q1 | 1761 |
| median | 3685 |
| Q3 | 5489 |
| 95-th percentile | 7291 |
| Maximum | 7707 |
| Range | 7707 |
| Interquartile range (IQR) | 3728 |
Descriptive statistics
| Standard deviation | 2226.914988 |
|---|---|
| Coefficient of variation (CV) | 0.6028120498 |
| Kurtosis | -1.144245677 |
| Mean | 3694.211137 |
| Median Absolute Deviation (MAD) | 1838 |
| Skewness | 0.05270535689 |
| Sum | 95129631 |
| Variance | 4959150.364 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 120 | 246 | 1.0% |
| 540 | 164 | 0.6% |
| 255 | 158 | 0.6% |
| 2779 | 150 | 0.6% |
| 7637 | 142 | 0.6% |
| 1179 | 141 | 0.5% |
| 3782 | 131 | 0.5% |
| 1898 | 118 | 0.5% |
| 272 | 118 | 0.5% |
| 1057 | 117 | 0.5% |
| Other values (7698) | 24266 | 94.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 3 | < 0.1% |
| 3 | 4 | < 0.1% |
| 4 | 6 | < 0.1% |
| 5 | 43 | 0.2% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 12 | < 0.1% |
| 9 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7707 | 1 | < 0.1% |
| 7706 | 3 | < 0.1% |
| 7705 | 1 | < 0.1% |
| 7704 | 30 | 0.1% |
| 7703 | 1 | < 0.1% |
| 7702 | 1 | < 0.1% |
| 7701 | 1 | < 0.1% |
| 7700 | 1 | < 0.1% |
| 7699 | 1 | < 0.1% |
| 7698 | 1 | < 0.1% |
risk_rank
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 402.4 KiB |
| Value | Count |
|---|---|
| 1 | 17348 |
| 3 | 7448 |
| 2 | 891 |
| 0 | 64 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 25751 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 3 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17348 | 67.4% |
| 3 | 7448 | 28.9% |
| 2 | 891 | 3.5% |
| 0 | 64 | 0.2% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17348 | 67.4% |
| 3 | 7448 | 28.9% |
| 2 | 891 | 3.5% |
| 0 | 64 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17348 | 67.4% |
| 3 | 7448 | 28.9% |
| 2 | 891 | 3.5% |
| 0 | 64 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 25751 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17348 | 67.4% |
| 3 | 7448 | 28.9% |
| 2 | 891 | 3.5% |
| 0 | 64 | 0.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 25751 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17348 | 67.4% |
| 3 | 7448 | 28.9% |
| 2 | 891 | 3.5% |
| 0 | 64 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 25751 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17348 | 67.4% |
| 3 | 7448 | 28.9% |
| 2 | 891 | 3.5% |
| 0 | 64 | 0.2% |
occupation_code
Real number (ℝ≥0)
| Distinct | 21 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.18686653 |
| Minimum | 0 |
|---|---|
| Maximum | 20 |
| Zeros | 118 |
| Zeros (%) | 0.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 402.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 12 |
| median | 15 |
| Q3 | 19 |
| 95-th percentile | 19 |
| Maximum | 20 |
| Range | 20 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.775775434 |
|---|---|
| Coefficient of variation (CV) | 0.3366335635 |
| Kurtosis | -0.08229778166 |
| Mean | 14.18686653 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.8265953145 |
| Sum | 365326 |
| Variance | 22.808031 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=21)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 6305 | 24.5% |
| 12 | 5047 | 19.6% |
| 17 | 4110 | 16.0% |
| 9 | 2464 | 9.6% |
| 18 | 1236 | 4.8% |
| 15 | 1006 | 3.9% |
| 13 | 979 | 3.8% |
| 5 | 946 | 3.7% |
| 20 | 704 | 2.7% |
| 14 | 635 | 2.5% |
| Other values (11) | 2319 | 9.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 118 | 0.5% |
| 1 | 220 | 0.9% |
| 2 | 130 | 0.5% |
| 3 | 314 | 1.2% |
| 4 | 555 | 2.2% |
| 5 | 946 | 3.7% |
| 6 | 1 | < 0.1% |
| 7 | 103 | 0.4% |
| 8 | 135 | 0.5% |
| 9 | 2464 | 9.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 704 | 2.7% |
| 19 | 6305 | 24.5% |
| 18 | 1236 | 4.8% |
| 17 | 4110 | 16.0% |
| 16 | 321 | 1.2% |
| 15 | 1006 | 3.9% |
| 14 | 635 | 2.5% |
| 13 | 979 | 3.8% |
| 12 | 5047 | 19.6% |
| 11 | 351 | 1.4% |
total_asset
Real number (ℝ≥0)
| Distinct | 8073 |
|---|---|
| Distinct (%) | 31.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 713742.6661 |
| Minimum | 0 |
|---|---|
| Maximum | 73863211 |
| Zeros | 3120 |
| Zeros (%) | 12.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 402.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 7508 |
| median | 128880 |
| Q3 | 597231.5 |
| 95-th percentile | 2717416 |
| Maximum | 73863211 |
| Range | 73863211 |
| Interquartile range (IQR) | 589723.5 |
Descriptive statistics
| Standard deviation | 2435460.555 |
|---|---|
| Coefficient of variation (CV) | 3.412238991 |
| Kurtosis | 231.2987476 |
| Mean | 713742.6661 |
| Median Absolute Deviation (MAD) | 128880 |
| Skewness | 12.73843401 |
| Sum | 1.83795874 × 10^10 |
| Variance | 5.931468114 × 10^12 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3120 | 12.1% |
| 101 | 145 | 0.6% |
| 303 | 83 | 0.3% |
| 715 | 79 | 0.3% |
| 104 | 76 | 0.3% |
| 8848 | 73 | 0.3% |
| 102 | 51 | 0.2% |
| 103 | 40 | 0.2% |
| 18390 | 32 | 0.1% |
| 201 | 30 | 0.1% |
| Other values (8063) | 22022 | 85.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3120 | 12.1% |
| 6 | 5 | < 0.1% |
| 7 | 1 | < 0.1% |
| 14 | 5 | < 0.1% |
| 16 | 6 | < 0.1% |
| 22 | 1 | < 0.1% |
| 24 | 2 | < 0.1% |
| 26 | 1 | < 0.1% |
| 30 | 3 | < 0.1% |
| 31 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 73863211 | 1 | < 0.1% |
| 54967807 | 1 | < 0.1% |
| 54497222 | 11 | < 0.1% |
| 54352277 | 1 | < 0.1% |
| 47869576 | 5 | < 0.1% |
| 47223161 | 1 | < 0.1% |
| 43169799 | 1 | < 0.1% |
| 38202504 | 1 | < 0.1% |
| 37641013 | 4 | < 0.1% |
| 35829293 | 5 | < 0.1% |
AGE
Real number (ℝ≥0)
| Distinct | 11 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.63302396 |
| Minimum | 0 |
|---|---|
| Maximum | 10 |
| Zeros | 3 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 402.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 6 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.309948141 |
|---|---|
| Coefficient of variation (CV) | 0.3605668874 |
| Kurtosis | 0.6346197153 |
| Mean | 3.63302396 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.7775862555 |
| Sum | 93554 |
| Variance | 1.715964133 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=11)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 8165 | 31.7% |
| 4 | 6498 | 25.2% |
| 2 | 5088 | 19.8% |
| 5 | 3531 | 13.7% |
| 6 | 1725 | 6.7% |
| 7 | 479 | 1.9% |
| 1 | 93 | 0.4% |
| 8 | 88 | 0.3% |
| 9 | 74 | 0.3% |
| 10 | 7 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3 | < 0.1% |
| 1 | 93 | 0.4% |
| 2 | 5088 | 19.8% |
| 3 | 8165 | 31.7% |
| 4 | 6498 | 25.2% |
| 5 | 3531 | 13.7% |
| 6 | 1725 | 6.7% |
| 7 | 479 | 1.9% |
| 8 | 88 | 0.3% |
| 9 | 74 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 7 | < 0.1% |
| 9 | 74 | 0.3% |
| 8 | 88 | 0.3% |
| 7 | 479 | 1.9% |
| 6 | 1725 | 6.7% |
| 5 | 3531 | 13.7% |
| 4 | 6498 | 25.2% |
| 3 | 8165 | 31.7% |
| 2 | 5088 | 19.8% |
| 1 | 93 | 0.4% |
date
Real number (ℝ≥0)
| Distinct | 262 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 198.1640325 |
| Minimum | 0 |
|---|---|
| Maximum | 393 |
| Zeros | 88 |
| Zeros (%) | 0.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 402.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 92 |
| median | 210 |
| Q3 | 295 |
| 95-th percentile | 376 |
| Maximum | 393 |
| Range | 393 |
| Interquartile range (IQR) | 203 |
Descriptive statistics
| Standard deviation | 118.263229 |
|---|---|
| Coefficient of variation (CV) | 0.596794623 |
| Kurtosis | -1.242200673 |
| Mean | 198.1640325 |
| Median Absolute Deviation (MAD) | 102 |
| Skewness | -0.1382121659 |
| Sum | 5102922 |
| Variance | 13986.19134 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 11 | 1010 | 3.9% |
| 13 | 410 | 1.6% |
| 272 | 255 | 1.0% |
| 273 | 239 | 0.9% |
| 277 | 215 | 0.8% |
| 280 | 203 | 0.8% |
| 377 | 186 | 0.7% |
| 312 | 182 | 0.7% |
| 6 | 177 | 0.7% |
| 258 | 161 | 0.6% |
| Other values (252) | 22713 | 88.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 88 | 0.3% |
| 5 | 152 | 0.6% |
| 6 | 177 | 0.7% |
| 7 | 83 | 0.3% |
| 8 | 86 | 0.3% |
| 11 | 1010 | 3.9% |
| 12 | 84 | 0.3% |
| 13 | 410 | 1.6% |
| 14 | 94 | 0.4% |
| 15 | 67 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 393 | 102 | 0.4% |
| 392 | 79 | 0.3% |
| 391 | 76 | 0.3% |
| 390 | 77 | 0.3% |
| 389 | 99 | 0.4% |
| 386 | 89 | 0.3% |
| 385 | 66 | 0.3% |
| 384 | 66 | 0.3% |
| 383 | 90 | 0.3% |
| 382 | 104 | 0.4% |
tx_time
Real number (ℝ≥0)
| Distinct | 257 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.896081706 |
| Minimum | 0 |
|---|---|
| Maximum | 479 |
| Zeros | 8142 |
| Zeros (%) | 31.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 402.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 3 |
| Q3 | 6 |
| 95-th percentile | 17 |
| Maximum | 479 |
| Range | 479 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 27.30572394 |
|---|---|
| Coefficient of variation (CV) | 3.959599828 |
| Kurtosis | 136.9119409 |
| Mean | 6.896081706 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 11.00664706 |
| Sum | 177581 |
| Variance | 745.6025598 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8142 | 31.6% |
| 1 | 2395 | 9.3% |
| 2 | 2279 | 8.9% |
| 5 | 2003 | 7.8% |
| 3 | 1918 | 7.4% |
| 4 | 1759 | 6.8% |
| 6 | 1570 | 6.1% |
| 7 | 1097 | 4.3% |
| 8 | 864 | 3.4% |
| 9 | 582 | 2.3% |
| Other values (247) | 3142 | 12.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8142 | 31.6% |
| 1 | 2395 | 9.3% |
| 2 | 2279 | 8.9% |
| 3 | 1918 | 7.4% |
| 4 | 1759 | 6.8% |
| 5 | 2003 | 7.8% |
| 6 | 1570 | 6.1% |
| 7 | 1097 | 4.3% |
| 8 | 864 | 3.4% |
| 9 | 582 | 2.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 479 | 1 | < 0.1% |
| 450 | 1 | < 0.1% |
| 446 | 1 | < 0.1% |
| 443 | 1 | < 0.1% |
| 439 | 2 | < 0.1% |
| 438 | 1 | < 0.1% |
| 431 | 1 | < 0.1% |
| 430 | 1 | < 0.1% |
| 425 | 1 | < 0.1% |
| 424 | 1 | < 0.1% |
amt
Real number (ℝ)
| Distinct | 18497 |
|---|---|
| Distinct (%) | 71.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5770835.038 |
| Minimum | -88663278.2 |
|---|---|
| Maximum | 9364465262 |
| Zeros | 6043 |
| Zeros (%) | 23.5% |
| Negative | 2 |
| Negative (%) | < 0.1% |
| Memory size | 402.4 KiB |
Quantile statistics
| Minimum | -88663278.2 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 173.5 |
| median | 66926.02 |
| Q3 | 778552.6175 |
| 95-th percentile | 18359600.9 |
| Maximum | 9364465262 |
| Range | 9453128541 |
| Interquartile range (IQR) | 778379.1175 |
Descriptive statistics
| Standard deviation | 92210705.88 |
|---|---|
| Coefficient of variation (CV) | 15.97874576 |
| Kurtosis | 8073.687761 |
| Mean | 5770835.038 |
| Median Absolute Deviation (MAD) | 66926.02 |
| Skewness | 82.57466783 |
| Sum | 1.486047731 × 10^11 |
| Variance | 8.50281428 × 10^15 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6043 | 23.5% |
| 4 | 26 | 0.1% |
| 8 | 15 | 0.1% |
| 1031 | 13 | 0.1% |
| 20 | 11 | < 0.1% |
| 12 | 10 | < 0.1% |
| 1 | 9 | < 0.1% |
| 17 | 8 | < 0.1% |
| 1043 | 8 | < 0.1% |
| 516 | 7 | < 0.1% |
| Other values (18487) | 19601 | 76.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -88663278.2 | 1 | < 0.1% |
| -997441 | 1 | < 0.1% |
| 0 | 6043 | 23.5% |
| 1 | 9 | < 0.1% |
| 2 | 4 | < 0.1% |
| 3 | 7 | < 0.1% |
| 4 | 26 | 0.1% |
| 5 | 7 | < 0.1% |
| 6 | 4 | < 0.1% |
| 7 | 4 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9364465262 | 1 | < 0.1% |
| 9233717521 | 1 | < 0.1% |
| 2440307900 | 1 | < 0.1% |
| 2381178463 | 1 | < 0.1% |
| 1569323372 | 1 | < 0.1% |
| 1566106409 | 1 | < 0.1% |
| 1521121594 | 1 | < 0.1% |
| 1498621080 | 1 | < 0.1% |
| 1445719427 | 1 | < 0.1% |
| 1381436111 | 1 | < 0.1% |
sar_flag
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 402.4 KiB |
| Value | Count |
|---|---|
| 0.0 | 25517 |
| 1.0 | 234 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 77253 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 25517 | 99.1% |
| 1.0 | 234 | 0.9% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 25517 | 99.1% |
| 1.0 | 234 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 51268 | 66.4% |
| . | 25751 | 33.3% |
| 1 | 234 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 51502 | 66.7% |
| Other Punctuation | 25751 | 33.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 51268 | 99.5% |
| 1 | 234 | 0.5% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 25751 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 77253 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 51268 | 66.4% |
| . | 25751 | 33.3% |
| 1 | 234 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 77253 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 51268 | 66.4% |
| . | 25751 | 33.3% |
| 1 | 234 | 0.3% |
Auto
The auto setting is an easily interpretable pairwise column metric that uses the following mapping (vartype-vartype: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
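The calculation described above (the Pearson formula applied to rank variables) can be sketched in plain Python. This is an illustrative sketch, not the code pandas-profiling runs; the function names are hypothetical, and ties are assumed to receive the average of their rank positions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
rho = spearman(x, y)     # close to 1, the relationship is perfectly monotonic
r = pearson(x, y)        # below 1, the relationship is not linear
print(rho, r)
```

The toy data illustrates why ρ suits skewed columns: a monotonic but nonlinear relationship yields ρ near 1 while r stays lower.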
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
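A minimal sketch of the formula above, together with a check of the invariance under (positive) changes in location and scale. The data and the name `pearson_r` are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mx) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly linear in x

r_original = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [10 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_rescaled = pearson_r(x_shifted, y_shifted)

print(r_original, r_rescaled)  # the two values agree up to floating-point error
```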
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
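The pair-counting definition above corresponds to the tau-a variant, sketched below in plain Python. Note the assumption: library implementations often default to a tie-adjusted variant instead (SciPy's `kendalltau` defaults to tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0: a tie in x or y, counted as neither
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 − 1) / 6.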
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
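Returning to the Cramér's V note in the correlation descriptions: the bias correction attributed to Bergsma (2013) can be sketched as follows for a contingency table of counts. This is an illustrative sketch under stated assumptions (at least two rows and two columns, no empty row or column), not the exact code used by pandas-profiling; the function name is hypothetical.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    c_corr = c - (c - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(r_corr - 1, c_corr - 1))


v_perfect = cramers_v_corrected([[10, 0], [0, 10]])    # perfect association
v_independent = cramers_v_corrected([[5, 5], [5, 5]])  # independence
print(v_perfect, v_independent)
```

For a numerical-categorical pair, the auto setting described earlier applies the same measure after discretizing the numerical column, which amounts to building such a table from the bins.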
First rows
| | alert_key | cust_id | risk_rank | occupation_code | total_asset | AGE | date | tx_time | amt | sar_flag |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 352249 | 3912 | 1 | 19.0 | 1465816.0 | 7 | 365 | 24.0 | 1.286508e+07 | 0.0 |
| 1 | 352253 | 5393 | 1 | 2.0 | 98177.0 | 2 | 365 | 5.0 | 2.054065e+07 | 0.0 |
| 2 | 352254 | 6924 | 1 | 19.0 | 2052922.0 | 7 | 365 | 13.0 | 1.394017e+06 | 0.0 |
| 3 | 352280 | 3431 | 3 | 15.0 | 201906.0 | 5 | 365 | 13.0 | 6.514627e+06 | 0.0 |
| 4 | 352282 | 84 | 1 | 12.0 | 7450.0 | 5 | 365 | 1.0 | 1.047400e+04 | 0.0 |
| 5 | 352291 | 3932 | 1 | 17.0 | 182242.0 | 5 | 365 | 3.0 | 2.320450e+05 | 0.0 |
| 6 | 352298 | 7637 | 3 | 12.0 | 2422.0 | 4 | 365 | 4.0 | 3.339420e+05 | 0.0 |
| 7 | 352301 | 1317 | 1 | 19.0 | 2536600.0 | 4 | 365 | 1.0 | 2.647000e+03 | 0.0 |
| 8 | 352302 | 4769 | 1 | 17.0 | 173255.0 | 4 | 365 | 4.0 | 8.643975e+05 | 0.0 |
| 9 | 352305 | 4063 | 3 | 12.0 | 876408.0 | 3 | 365 | 13.0 | 1.560263e+07 | 0.0 |
Last rows
| | alert_key | cust_id | risk_rank | occupation_code | total_asset | AGE | date | tx_time | amt | sar_flag |
|---|---|---|---|---|---|---|---|---|---|---|
| 25741 | 352111 | 3066 | 1 | 17.0 | 829412.0 | 6 | 364 | 10.0 | 169170.000 | 0.0 |
| 25742 | 352114 | 5997 | 1 | 2.0 | 444392.0 | 3 | 364 | 6.0 | 829302.045 | 0.0 |
| 25743 | 352118 | 5306 | 1 | 15.0 | 242106.0 | 4 | 364 | 5.0 | 50632.185 | 0.0 |
| 25744 | 352119 | 7209 | 1 | 13.0 | 40907.0 | 2 | 364 | 1.0 | 13719.000 | 0.0 |
| 25745 | 352120 | 3910 | 1 | 17.0 | 114439.0 | 2 | 364 | 6.0 | 4251483.815 | 0.0 |
| 25746 | 352123 | 5500 | 1 | 17.0 | 12207.0 | 2 | 364 | 1.0 | 176612.000 | 0.0 |
| 25747 | 352124 | 188 | 1 | 17.0 | 259985.0 | 4 | 364 | 1.0 | 49784.000 | 0.0 |
| 25748 | 352125 | 336 | 3 | 19.0 | 928963.0 | 3 | 364 | 1.0 | 6900402.000 | 0.0 |
| 25749 | 352128 | 7704 | 3 | 19.0 | 21647.0 | 4 | 364 | 3.0 | 15680.000 | 0.0 |
| 25750 | 352132 | 5923 | 1 | 19.0 | 3218731.0 | 3 | 364 | 2.0 | 2837081.916 | 0.0 |